
Conversation

Collaborator

@Tcc0403 Tcc0403 commented Oct 1, 2025

Summary

Add a HAS_GRADIENTS flag to the cross entropy kernel so that gradient computation is skipped entirely when it isn't needed (e.g. under torch.no_grad()).
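
As a rough illustration of the pattern (not the actual Liger kernel — the kernel name, the single-block-per-row layout, and the wrapper below are assumptions for this sketch; only the idea of a `HAS_GRADIENTS` constexpr gating the gradient work comes from this PR):

```python
import torch
import triton
import triton.language as tl


@triton.jit
def ce_fwd_kernel(
    logits_ptr,    # (n_rows, n_cols); gradients are written back in place when needed
    labels_ptr,    # (n_rows,) target class indices
    loss_ptr,      # (n_rows,) per-row loss output
    n_cols,
    stride,
    BLOCK_SIZE: tl.constexpr,
    HAS_GRADIENTS: tl.constexpr,  # compile-time flag: all grad work is dead code when False
):
    row = tl.program_id(0)
    logits_ptr += row * stride
    label = tl.load(labels_ptr + row)

    offs = tl.arange(0, BLOCK_SIZE)
    mask = offs < n_cols
    logits = tl.load(logits_ptr + offs, mask=mask, other=float("-inf")).to(tl.float32)

    # numerically stable log-sum-exp
    m = tl.max(logits, axis=0)
    lse = m + tl.log(tl.sum(tl.exp(logits - m), axis=0))

    target_logit = tl.load(logits_ptr + label).to(tl.float32)
    tl.store(loss_ptr + row, lse - target_logit)

    if HAS_GRADIENTS:
        # d(loss)/d(logits) = softmax(logits) - one_hot(label), stored into the logits buffer
        grad = tl.exp(logits - lse)
        grad = tl.where(offs == label, grad - 1.0, grad)
        tl.store(logits_ptr + offs, grad.to(logits_ptr.dtype.element_ty), mask=mask)


def ce_forward(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean cross entropy over rows; skips gradient work when no backward will follow."""
    n_rows, n_cols = logits.shape
    loss = torch.empty(n_rows, dtype=torch.float32, device=logits.device)
    ce_fwd_kernel[(n_rows,)](
        logits, labels, loss,
        n_cols, logits.stride(0),
        BLOCK_SIZE=triton.next_power_of_2(n_cols),
        HAS_GRADIENTS=logits.requires_grad and torch.is_grad_enabled(),
    )
    return loss.mean()
```

Since `HAS_GRADIENTS` is a constexpr, Triton compiles a separate specialization with the gradient branch removed, rather than just skipping it at runtime.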

Testing Done

Cross Entropy forward with no_grad (benchmark screenshot)

Fused Linear Cross Entropy forward with no_grad (benchmark screenshot)
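
For reference, a minimal way to reproduce this kind of measurement (illustrative only; the numbers above come from the repo's own benchmark scripts, and the `liger_kernel.transformers.LigerCrossEntropyLoss` import path is assumed from the package layout):

```python
import torch
from liger_kernel.transformers import LigerCrossEntropyLoss  # import path assumed


def time_forward_no_grad(n_rows=4096, vocab=128256, dtype=torch.bfloat16, iters=50):
    logits = torch.randn(n_rows, vocab, device="cuda", dtype=dtype)
    target = torch.randint(0, vocab, (n_rows,), device="cuda")
    loss_fn = LigerCrossEntropyLoss()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():              # forward only, so the gradient path should be skipped
        for _ in range(5):             # warmup
            loss_fn(logits, target)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            loss_fn(logits, target)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters   # ms per forward call
```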

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@Tcc0403 Tcc0403 changed the title feat(ce,-flce): decouple gradients computation for no_grad mode feat(ce,flce): decouple gradients computation for no_grad mode Oct 1, 2025
@Tcc0403 Tcc0403 force-pushed the tcc/flce-eval-no-grad branch from 291141c to 9ab603a Compare October 1, 2025 13:40
Signed-off-by: Tcc0403 <[email protected]>
Signed-off-by: Tcc0403 <[email protected]>
@Tcc0403 Tcc0403 requested review from momochen and shimizust October 8, 2025 10:53
Collaborator

@shimizust shimizust left a comment

Nice, thanks for adding this. Is the forward FLCE still significantly slower than HF because we're still computing grad_input and applying the token scaling logic? Also, do you know why fp32 accum is faster?

@lancerts lancerts merged commit 5c2a04d into main Oct 11, 2025
2 checks passed
@lancerts lancerts deleted the tcc/flce-eval-no-grad branch October 11, 2025 15:44
Collaborator Author

Tcc0403 commented Oct 11, 2025

@shimizust The slower forward pass is kinda expected because:

  1. instead of one big matmul, we slice the input row-wise and do multiple matmuls. That adds some kernel launch overhead, and it's not guaranteed to invoke the most efficient kernels (tiling size, tail effect, etc., from the kernel's perspective).
  2. the multiple cross entropy kernel launches add similar overhead, and it all adds up.

I just found that we can remove this line in eval mode as well; cutting another matmul for each iteration should be significant:

grad_input[start_idx:end_idx] = grad_logits_chunk @ weight

and some grad tensor allocations too.
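
Roughly, the structure being described looks like this (an illustrative sketch, not the actual Liger FLCE code; grad_weight handling and the 1/n_rows gradient scaling are omitted for brevity):

```python
import torch
import torch.nn.functional as F


def flce_forward(x, weight, target, chunk_size=4096, compute_grads=True):
    """x: (n_rows, hidden), weight: (vocab, hidden), target: (n_rows,)."""
    n_rows = x.shape[0]
    grad_input = torch.zeros_like(x) if compute_grads else None  # skipped in eval/no_grad
    total_loss = x.new_zeros((), dtype=torch.float32)

    for start in range(0, n_rows, chunk_size):
        end = min(start + chunk_size, n_rows)
        logits_chunk = x[start:end] @ weight.T        # per-chunk matmul instead of one big one
        total_loss += F.cross_entropy(
            logits_chunk.float(), target[start:end], reduction="sum"
        )

        if compute_grads:
            # d(loss)/d(logits) = softmax(logits) - one_hot(target)
            grad_logits_chunk = torch.softmax(logits_chunk.float(), dim=-1)
            rows = torch.arange(end - start, device=x.device)
            grad_logits_chunk[rows, target[start:end]] -= 1.0
            grad_logits_chunk = grad_logits_chunk.to(x.dtype)
            # the extra matmul (and the grad buffers) that eval mode can drop entirely
            grad_input[start:end] = grad_logits_chunk @ weight

    return total_loss / n_rows, grad_input
```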

shimizust pushed a commit that referenced this pull request Oct 16, 2025
## Summary
follow-up #894 


## Testing Done

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

---------

Signed-off-by: Tcc0403 <[email protected]>
